Introduction

We will be looking at visualizations and analysis of data. We will be mapping the data to look for patterns as well as running some RDAs. The data in this project was obtained from the Northwest Fisheries Science Center (NWFSC). NOAA’s public access data inventory is a good place to look if you are wanting your own data but you will have to look by location. I choose to look at project’s within the Puget Sound area due to Seattle being my home base. The portion of the data my project will use is the Microbe and Nutrients data. The data was provided by Linda Rhodes who was studying the health of the pelagic food web in the Puget Sound in April-October of 2011.

The full data set and metadata: https://www.webapps.nwfsc.noaa.gov/apex/parrdata/inventory/tables/table/epa2011_physical_nutrients_microbes

Note Prior to uploading the CSV due to 52 columns of data and all headings being untidily labeled it was easier to remove in the CSV file than it was to tidy but tidying will also be shown below.

Data Definitions

Nutrients (Micromolar):

Nitrite: NO2

Nitrate: NO3

Ammonium: NH4

Phosphate: PO4

Silicate: SiOH4

Bacteria (cells/mL):

High Nucleic Acid Bacteria

Low Nucleic Acid Bacteria

Total Bacteria

Hypothesis: Nitrogen based nutrients will have a higher impact on bacteria.

Let’s get started

1.Set Working Directory

ex: setwd(“~/MARE375/MARE375Final/MARE375Final”) or go to Session -> Set Working Directory

2. Import Data

Install all necessary packages by install.packages(packagename) there are many through this

example: install.packages(readr)

library(readr)
mic <- read_csv("EPA2011Microbial_Nutrient.csv")
## Rows: 463 Columns: 15── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (15): ID, lat, long, temp, NO3, NO2, NH4, PO4, SiOH4, total_bacteria, hi...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
##View(mic)

You can also call your data by clicking your data set in Files -> Import Dataset… -> Import

You may have seen there is a code box above the Import button. You can grab the code from this area and put it in your mark down as I did above.

3.Looking over our data using ‘View(mic)’:

There are some missing values. There was no indication in the metadata why there were blanks. There are other areas that provide a 0 for their data so this bacteria may not be present in those areas and a 0 was just forgotten. In our learning purposes we can either put 0’s in, perform imputation, or remove the data. Here we are going to put 0’s.

mic[is.na(mic)] = 0
##View(mic)

Plotting the Data

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ stringr 1.4.0
## ✔ tidyr   1.2.0     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
mic1 <- mic %>% select ( -lat, -long, -ID, -temp)
##View(mic1)

source("panelutils.R") ## this source is needed to run the bivariate plot, it's not a package

pairs(mic1, panel = panel.smooth, diag.panel = panel.hist, main = "Bivariate Plot with Histograms and Smooth Curves", family= "Times")

It doesn’t look like there were any strong linear relationship between nutrients and microbes in general. We would expect there to be some significance between bacteria and nutrients due to their role in the food web in breaking down organic matter and releasing nutrients back into the environment. We should continue to look into this area for further analysis.

Split up the Dataset Into Variables

Split up our data in categories we want to look at:

Nutrients = nut

Bacteria = bac

Site Location = site

nut <- mic %>% select (NO2, NO3, NH4, PO4, SiOH4)
##View(nut)

bac <- mic %>% select (total_bacteria, low_nucleic_acid_bacteria, high_nucleic_acid_bacteria)
##View(bac)

site <- mic %>% select( "long", "lat")
##View(site)

Mapping Data

Orienting yourself to your site on a map is a good visual starter. If you upload your latitude and longitude into Tableau you can get a map with your own sites labeled. You can choose from a variety of backgrounds such as gray and white or satellite as seen here.

Note: If you run into problems with Tableau causing issues with automatically using (SUM)Latitude and (SUM)Longitude instead of individual points turn off Aggregate Measures in the Analysis menu.

We can number the sites to keep track of them area. On this map the longitudinal scale is exaggerated which will help us see the numbers better but the pattern is still the same.

plot(site, type = "n", main = "Site Locations Numbered", xlab = "Longitude", ylab = "Latitude" ,family = "Times")
text(site, row.names(site), cex = 0.8, col = "darkslateblue" , family = "Times")
text(-123.0, 48.5, "Puget Sound , WA", cex = 1.2, col = "Red") #adding labels to identify areas due to longitudinal exaggeration
text(-122.72, 48.74, "Bellingham Area", cex = 0.8, col = "Black")
text(-122.27, 48.2, "Everett Area", cex = 0.8, col = "Black")
text(-122.27, 47.59, "Seattle Area", cex = 0.8, col = "Black")
text(-122.52, 47.7, "Bremerton Area", cex = 0.8, col = "Black")
text(-122.34, 47.29, "Tacoma Area", cex = 0.8, col = "Black")
text(-122.9, 47.29, "Olympia Area", cex = 0.8, col = "Black")
text(-123.09, 47.27, "Shelton Area", cex = 0.8, col = "Black")
text(-123.0, 48.0, "Strait of Juan de Fuca", cex = 0.8, col = "Black") #this is your large open blue area unlabeled on Tableau Map

Visualizing the Nutrient and Microbial Concentrations on the Map

Plotting the nutrient data on the map with the latitude and longitude shows us the concentrations of each throughout the stations

par(mfrow=c(1,2))
plot(site, asp = 1, main = "NO3 Concentrations by Site", pch = 21, col = "black", bg = "cornflowerblue", cex = 5*mic1$NO3/max(mic1$NO3),xlab = "Longitude", ylab = "Latitude", family = "Times")
##family = "Times" means the font is written in Times New Roman, this is far better than base fonts for this graph

plot(site, asp = 1, main = "NO2 Concentrations by Site", pch = 21, col = "black", bg = "coral3", cex = 5*mic1$NO2/max(mic1$NO2), xlab = "Longitude", ylab = "Latitude", family = "Times")

plot(site, asp = 1, main = "NH4 Concentrations by Site", pch = 21, col = "black", bg = "darkolivegreen3", cex = 5*mic1$NH4/max(mic1$NH4), xlab = "Longitude", ylab = "Latitude", family = "Times")


plot(site, asp = 1, main = "PO4 Concentrations by Site", pch = 21, col = "black", bg = "orchid", cex = 5*mic1$PO4/max(mic1$PO4), xlab = "Longitude", ylab = "Latitude", family = "Times")

plot(site, asp = 1, main = "SiOH4 Concentrations by Site", pch = 21, col = "black", bg = "slateblue", cex = 5*mic1$SiOH4/max(mic1$SiOH4), xlab = "Longitude", ylab = "Latitude", family = "Times")

Looking for patterns in Nutrients at Sites:

There are some larger concentrations of NH~4 in the Olympic and Bremerton Area and severely lacking concentrations in the inlet leading toward Shelton.

There are also some high concentrations of PO4 from Everett toward Bellingham.

NO~3 , NO~2 , and NH~4 don’t appear to have anything super significant popping out.

par(mfrow=c(1,2))

plot(site, asp = 1, main = "Low Nucleic Acid Bacteria", pch = 21, col = "black", bg = "cyan4", cex = 5*mic1$low_nucleic_acid_bacteria/max(mic1$low_nucleic_acid_bacteria), xlab = "Longitude", ylab = "Latitude", family = "Times")

plot(site, asp = 1, main = "High Nucleic Acid Bacteria", pch = 21, col = "black", bg = "darkorange", cex = 5*mic1$high_nucleic_acid_bacteria/max(mic1$high_nucleic_acid_bacteria), xlab = "Longitude", ylab = "Latitude", family = "Times")

plot(site, asp = 1, main = "Total Bacteria By Site", pch = 21, col = "black", bg = "darkorchid4", cex = 5*mic1$total_bacteria/max(mic1$total_bacteria),xlab = "Longitude", ylab = "Latitude", family = "Times")

Looking for Patterns in Bacteria and Nutrients at Sites:

All 3 graphs show that the same area of High PO~4 Concentrations are low bacterial concentrations

Low Nucleic Acid Bacteria have a pattern near the Shoreline of in the inlet leading toward Shelton.

High Nucleic Acid Bacteria appear to be concentrated near Olympia.

Transforming our Data

Use the Hellinger transformation to obtain eigan values for an RDA and try to normalize our data.

library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-7
bac.hel <- bac.hel <- decostand(bac, "hellinger")

head(bac)
## # A tibble: 6 × 3
##   total_bacteria low_nucleic_acid_bacteria high_nucleic_acid_bacteria
##            <dbl>                     <dbl>                      <dbl>
## 1        488326.                    37494.                    450832.
## 2             0                         0                          0 
## 3       2123498.                   261850.                   1861648.
## 4       1987126.                   472888.                   1514238.
## 5       3173723.                   293880.                   2879843.
## 6        470499.                    42698.                    427801.
head(bac.hel)
##   total_bacteria low_nucleic_acid_bacteria high_nucleic_acid_bacteria
## 1      0.7071068                 0.1959348                  0.6794186
## 2      0.0000000                 0.0000000                  0.0000000
## 3      0.7071068                 0.2483048                  0.6620761
## 4      0.7071068                 0.3449464                  0.6172617
## 5      0.7071068                 0.2151719                  0.6735734
## 6      0.7071068                 0.2130139                  0.6742589
source("evplot.R")
source("hcoplot.R")

bac.rda <- rda(bac.hel ~ ., nut)
##summary(bac.rda) The summary is hidden due to the large output important results below

R2adj <- RsquareAdj(bac.rda)$adj.r.squared
R2adj
## [1] 0.1952862

What does this mean?

This analysis yielded 3 canonical axes with eigenvalues labelled RDA1 to RDA3. and 3 additional, unconstrained axes for the residuals with eigenvalues labelled PC1 to PC3. The cumulative contribution to the variance obtained by the 3 canonical axes is R= 0.204 and of that RDA1 makes up almost the entirety of the variance at 19.452%. The Adjusted R = 0.1952862 or 19.529% of the data is explained by the 3 canonical axes.

The triplot shown below show that NO3, SiOH4, and NH4 play a significant role in the dispersion of the sites along the first axis.

plot(bac.rda, scaling = 1, main = "Triplot RDA of bac.hel ~ nut - scaling 1", family="Times" )

bac.sc <- scores(bac.rda, choices = 1:2, scaling = 1, display = "sp")
arrows(0, 0, bac.sc[,1], bac.sc[,2], length = 0, lty = 1, col = "red")

##I am for some reason unable to get which bacteria is which, and site numbers but would be crowded regardless

Thank you